Data description
We were provided with the following data:
- Saliva sample OTU count table: 294 samples across 1726 OTUs
- Subject ID: 65 subjects
- Subject type: wild type (WT) or knock-out
(KO)
- Subject sex: M or F
- Subject diagnosis: 1, 2, or 3
- Sample time (in weeks): 0, 4, 8, 12, 16, 22, where not all subject
were measured at all time points
- Taxonomic information for each OTU
OTU counts were aggregated at three different
taxonomic levels: genus (169 taxa), family (106 taxa)
and order (55 taxa). OTU counts were transformed using the
centered log-ratio (clr) transformation where
zero counts were replaced by a pseudo count of 0.5.
Goals
- Identify, if any, which taxa are differentially
abundant (DA) between the two groups
- Identify, if any, the time points at which the two
groups are differentially abundant
Model description
Notation
- \(i\) indexes subjects
- \(x_i=0/1\) encodes subject type
(1=KO)
- \(u_i=0/1\) encodes subject sex
(1=F)
- \(y_{ij}(t)\) is the
clr-transformed count for taxa \(j\) at
time \(t\)
Fixed effect I: varying-coefficient model
We assume that each group has its own mean trajectory function. To
emphasize the DA problem, we opt for the following parameterization:
- The WT group has mean function \(\beta_0(t)\)
- The KO group has mean function \(\beta_0(t)+\beta_1(t)\)
In particular, \(\beta_1(t)\) is the
quantity of interest: when \(\beta_1(t)=0\), then we can conclude that
the taxa under consideration is not DA at time \(t\).
We can specify the mean for subject \(i\) as
\[
\beta_0(t) + x_i\beta_1(t)
\]
Fixed effect II: non-varying effect
To adjust for potential variability across sex, we include \(u_i\) as a constant effect:
\[
\beta_0(t) + x_i\beta_1(t) + u_i\alpha
\]
Random effect: subject-specific random intercept
To account for longitudinal correlation between samples of a given
subject, we include a random intercept \(\gamma_i\) that is taken to be constant
across time. The final mean is given by the conditional mean
\[
\mu_{ij}(t):=\mathbb E\{y_{ij}(t) \mid \gamma_i \}
=
\beta_{0j}(t) + x_i\beta_{1j}(t) + u_i\alpha_j + \gamma_i
\]
with a homoskedastic and conditional independence (given random
intercept) assumption across subjects and time points:
\[
y_{ij}(t) \mid \gamma_i \sim \mathcal N(\mu_{ij}(t), \sigma_j^2)
\]
Overview of the estimation procedure
Our proposed method combines the following elements:
- Varying coefficients to model longitudinal relationships along
covariates (here: two groups)
- Constant effects (here: sex)
- Random effects (here: random intercept)
- Sparse estimation (here: to encourage 0s in \(\beta_1(t)\))
In particular, it enables:
- Smooth estimated curves across time
- Local (& global) differential analysis
- Adjustment for other covariates
- Adjustment for dependence across samples
- Irregular sampling time
Intuition
In contrast to performing DA analysis at each time point
idenpendently, our proposed approach can potentially borrow strength
across adjacent time points. Conversely, smoothing neighboring time
points can potentially avoid some false positive. This interpretation
relies on the underlying assumption that abundance varies
smoothly with time.
Pipeline
For each taxa \(j\):
- Fit the model across a grid of tuning parameters (smoothness and
sparsity)
- Tuning parameter selection using an information criterion
- Bootstrapping to obtain uncertainty quantification
- Conclude to local DA whenever the pointwise simultaneous confidence
band excludes 0
Results
We select all taxa for which we identified local DA or
cross-sectionally DA at at least one week. Blue cells indicate a
log-fold increase from WT to KO; red cells indicate a log-fold decrease
from WT to KO. Cells with an asterix (*) indicate that our method
identified local DA; cells with grey stripes indicate that the
cross-sectional test indentifies local DA.
Genus level

Family level

Order level

Taxa-level results
For each taxa previously reported, we produce a plot with three
components:
- A boxplot of the observed clr-transformed counts stratified by time
and type (note that some outliers prevented a reasonable display, so
values were trimmed)
- The estimated difference between the two groups, where positive
values represent a log-fold increase from WT to KO. The plot also
contains the bootstrap samples (faint lines) as well as pointwise
simultaneous confidence intervals. Black intervals represent non-DA; red
intervals represent local DA.
- The results for DESeq2, a Negative Binomial test. We display the
estimated log-fold change. If the time point is selected, then it is
displayed in red. For a somewhat fair comparison, we perform a
Bonferroni correction within taxa, across time (i.e., times 6).